Skip to content

Approximate Median-of-Medians in Select #92348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

JulianKnodt
Copy link
Contributor

@JulianKnodt JulianKnodt commented Dec 28, 2021

Previously, there was a cheap constant time approximation to median of medians which sampled a very small amount from fixed indices in the array, but it had O(n^2) worst case behaviour which was simple to craft worst cases for.
Instead, use a more general approximation median of medians algorithm. This is also constant time,
but is much cheaper than normal median of medians since it selects random chunks to sort and
only finds the medians of those. It only uses constant space, and in theory could be configured
better to handle small inputs to allocate less.

While it uses a weird LCG like thing, if there was a rand implementation in core, it could be used instead. While this is currently deterministic and it could be argued that that would lead to being able to craft adversarial cases, this is also true for the prior implementation.

Addresses #91644

I didn't include full benchmarks yet, because I can't get the third one to complete on my machine for the previous implementation, possibly because I'm running compute intensive jobs at the same time.

It probably makes sense to add more benchmarks for the average case, but here's the worst case provided in the original issue:

New:

test slice::quickselect_l1                                      ... bench:       1,754 ns/iter (+/- 2)
test slice::quickselect_l2                                      ... bench:       5,829 ns/iter (+/- 39)
test slice::quickselect_l3                                      ... bench:     530,171 ns/iter (+/- 10,829)

Old:

test slice::quickselect_l1                                      ... bench:         950 ns/iter (+/- 15)
test slice::quickselect_l2                                      ... bench:       7,375 ns/iter (+/- 44)
test slice::quickselect_l3                                      ... bench:     728,402 ns/iter (+/- 6,931)

@rust-highfive
Copy link
Contributor

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 28, 2021
@rust-log-analyzer

This comment has been minimized.

Previously, there was a cheap approximation to median of medians, but it lead to O(n^2) worst
case behaviour. Instead, use an approximate median of medians algorithm. This is linear time,
but is much cheaper than normal median of medians since it selects random chunks to sort and
only finds the medians of those. It only uses constant space, and in theory could be configured
better to handle small inputs to allocate less.
@joshtriplett
Copy link
Member

I don't think we should use something random and non-deterministic here; that would make behavior and runtime performance itself non-deterministic, and that's something we've had a lot of trouble with, especially when trying to benchmark.

@JulianKnodt
Copy link
Contributor Author

The algorithm provided is fully deterministic, since it uses an LCG seeded with the size of the array. In some sense, the old version was also "random" in that it selected arbitrary indices to select, but it was just more clear which indices it would select since they were at fixed distances based on the length of the array.

@joshtriplett
Copy link
Member

@JulianKnodt Ah, I see. Isn't that very nearly as easy to craft worst-cases for, then, if you know the length in advance?

@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Jan 15, 2022

If you're willing to put the effort in, I think it would still be possible, but it would require more work. I assume in practice it would hit the worst case less often, but as long as it's deterministic an adversarial case could be made

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2022
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 20, 2022
@JohnCSimon
Copy link
Member

@JulianKnodt
Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this. Thanks for your contribution.

@rustbot label: +S-inactive

@JohnCSimon JohnCSimon closed this Apr 11, 2022
@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Apr 11, 2022
@JulianKnodt
Copy link
Contributor Author

@JohnCSimon I'm not sure what more there is to do for this PR, as I wasn't able to understand exactly what would make this feature more ready for being merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants